4 research outputs found

    Empirical Optimal Transport between Conditional Distributions

    Full text link
    Given samples from two joint distributions, we consider the problem of Optimal Transportation (OT) between the corresponding distributions conditioned on a common variable. The objective of this work is to estimate the associated transport cost (Wasserstein distance) as well as the transport plan between the conditionals as a function of the conditioned value. Since matching conditional distributions is at the core of supervised training of discriminative models and (implicit) conditional-generative models, OT between conditionals has the potential to be employed in diverse machine learning applications. However, since the conditionals involved in OT are implicitly specified via the joint samples, it is challenging to formulate this problem, especially when (i) the variable conditioned on is continuous and (ii) the marginal of this variable in the two distributions is different. We overcome these challenges by employing a specific kernel MMD (Maximum Mean Discrepancy) based regularizer that ensures the marginals of our conditional transport plan are close to the conditionals specified via the given joint samples. Under mild conditions, we prove that our estimator for this regularized transport cost is statistically consistent and derive finite-sample bounds on the estimation error. Application-specific details for parameterizing our conditional transport plan are also presented. Furthermore, we empirically evaluate our methodology on benchmark datasets in applications like classification, prompt learning for few-shot classification, and conditional-generation in the context of predicting cell responses to cancer treatment

    Learning Algorithms using Chance-Constrained Programs

    No full text
    This thesis explores Chance-Constrained Programming (CCP) in the context of learning. It is shown that chance-constraint approaches lead to improved algorithms for three important learning problems — classification with specified error rates, large dataset classification and Ordinal Regression (OR). Using moments of training data, the CCPs are posed as Second Order Cone Programs (SOCPs). Novel iterative algorithms for solving the resulting SOCPs are also derived. Borrowing ideas from robust optimization theory, the proposed formulations are made robust to moment estimation errors. A maximum margin classifier with specified false positive and false negative rates is derived. The key idea is to employ chance-constraints for each class which imply that the actual misclassification rates do not exceed the specified. The formulation is applied to the case of biased classification. The problems of large dataset classification and ordinal regression are addressed by deriving formulations which employ chance-constraints for clusters in training data rather than constraints for each data point. Since the number of clusters can be substantially smaller than the number of data points, the resulting formulation size and number of inequalities are very small. Hence the formulations scale well to large datasets. The scalable classification and OR formulations are extended to feature spaces and the kernelized duals turn out to be instances of SOCPs with a single cone constraint. Exploiting this speciality, fast iterative solvers which outperform generic SOCP solvers, are proposed. Compared to state-of-the-art learners, the proposed algorithms achieve a speed up as high as 10000 times, when the specialized SOCP solvers are employed. The proposed formulations involve second order moments of data and hence are susceptible to moment estimation errors. A generic way of making the formulations robust to such estimation errors is illustrated. Two novel confidence sets for moments are derived and it is shown that when either of the confidence sets are employed, the robust formulations also yield SOCPs

    Robust formulations for clustering-based large-scale classification

    No full text
    Chebyshev-inequality-based convex relaxations of Chance-Constrained Programs (CCPs) are shown to be useful for learning classifiers on massive datasets. In particular, an algorithm that integrates efficient clustering procedures and CCP approaches for computing classifiers on large datasets is proposed. The key idea is to identify high density regions or clusters from individual class conditional densities and then use a CCP formulation to learn a classifier on the clusters. The CCP formulation ensures that most of the data points in a cluster are correctly classified by employing a Chebyshev-inequality-based convex relaxation. This relaxation is heavily dependent on the second-order statistics. However, this formulation and in general such relaxations that depend on the second-order moments are susceptible to moment estimation errors. One of the contributions of the paper is to propose several formulations that are robust to such errors. In particular a generic way of making such formulations robust to moment estimation errors is illustrated using two novel confidence sets. An important contribution is to show that when either of the confidence sets is employed, for the special case of a spherical normal distribution of clusters, the robust variant of the formulation can be posed as a second-order cone program. Empirical results show that the robust formulations achieve accuracies comparable to that with true moments, even when moment estimates are erroneous. Results also illustrate the benefits of employing the proposed methodology for robust classification of large-scale datasets

    Variable sparsity kernel learning

    No full text
    This paper 1 presents novel algorithms and applications for a particular class of mixed-norm regularization based Multiple Kernel Learning (MKL) formulations. The formulations assume that the given kernels are grouped and employ l1 norm regularization for promoting sparsity within RKHS norms of each group and ls,s≥2 norm regularization for promoting non-sparse combinations across groups. Various sparsity levels in combining the kernels can be achieved by varying the grouping of kernels—hence we name the formulations as Variable Sparsity Kernel Learning (VSKL) formulations. While previous attempts have a non-convex formulation, here we present a convex formulation which admits efficient Mirror-Descent (MD) based solving techniques. The proposed MD based algorithm optimizes over product of simplices and has a computational complexity of O ( m 2 ntot lognmax/ε 2) where m is no. training data points, nmax,ntot are the maximum no. kernels in any group, total no. kernels respectively and ε is the error in approximating the objective. A detailed proof of convergence of the algorithm is also presented. Experimental results show that the VSKL formulations are well-suited for multi-modal learning tasks like object categorization. Results also show that the MD based algorithm outperforms state-of-the-art MKL solvers in term
    corecore